Smart Paper Notes

PVT-COV19D: Pyramid Vision Transformer for COVID-19 Diagnosis

Lilang Zheng, Jiaxuan Fang, Xiaorun Tang, Hanzhang Li, Jiaxin Fan, Tianyi Wang, Rui Zhou, Zhaoyan Yan

Category: Computer Vision

2022-06-30

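With the outbreak of COVID-19, a large number of related studies have emerged in recent years. We propose an automatic COVID-19 diagnosis framework based on lung CT scan images, named PVT-COV19D. To accommodate the different dimensions of the image input, we first classify the images using a Transformer model, then sample the images in the dataset according to a normal distribution, and feed the sampling results into a modified PVTv2 model for training. Extensive experiments on the COV19-CT-DB dataset demonstrate the effectiveness of the proposed method.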

Translated by Google Translate
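
The sampling step in the PVT-COV19D abstract can be illustrated with a minimal sketch. The abstract only says that images are sampled "according to a normal distribution"; the choice below (centring the distribution on the middle slice of a CT volume, sampling with replacement, and the `std_fraction` parameter) is an assumption made for illustration, not the authors' documented procedure.

```python
import random

def sample_slice_indices(num_slices, num_samples, std_fraction=0.2, seed=0):
    """Draw slice indices from a normal distribution centred on the middle slice.

    All parameter names are hypothetical; the abstract does not specify them.
    """
    if num_slices <= 0:
        raise ValueError("num_slices must be positive")
    rng = random.Random(seed)
    mean = (num_slices - 1) / 2.0               # assumed centre: middle slice
    std = max(std_fraction * num_slices, 1e-6)  # assumed spread
    indices = []
    while len(indices) < num_samples:
        idx = round(rng.gauss(mean, std))
        if 0 <= idx < num_slices:               # reject draws outside the volume
            indices.append(idx)
    return sorted(indices)

# A volume with 120 slices reduced to a fixed-length input of 16 slices.
indices = sample_slice_indices(num_slices=120, num_samples=16)
```

Sampling a fixed number of indices gives every scan the same input length regardless of how many slices it contains, which is one plausible way to handle the varying input dimensions the abstract mentions.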

Shape from Polarization for Complex Scenes in the Wild

Chenyang Lei, Chenyang Qi, Jiaxin Xie, Na Fan, Vladlen Koltun, Qifeng Chen

Category: Computer Vision

2021-12-21

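We present a new data-driven approach with physics-based priors for scene-level normal estimation from a single polarization image. Existing shape from polarization (SfP) work mainly focuses on estimating the normals of a single object rather than complex scenes in the wild. A key barrier to high-quality scene-level SfP is the lack of real-world SfP data for complex scenes. We therefore contribute the first real-world scene-level SfP dataset, with paired input polarization images and ground-truth normal maps. We then propose a learning-based framework with a multi-head self-attention module and viewing encoding, designed to handle the increased polarization ambiguity caused by complex materials and non-orthographic projection in scene-level SfP. Because the relationship between polarized light and surface normals is not affected by distance, our trained model generalizes to far-field outdoor scenes. Experimental results show that our approach significantly outperforms existing SfP models on two datasets. Our dataset and source code will be publicly available at https://github.com/chenyanglei/sfp-wild.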


Gradient-based Novelty Detection Boosted by Self-supervised Binary Classification

Jingbo Sun, Li Yang, Jiaxin Zhang, Frank Liu, Mahantesh Halappanavar, Deliang Fan, Yu Cao

Category: Machine Learning

2021-12-18

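Novelty detection aims to automatically identify out-of-distribution (OOD) data without any prior knowledge of them. It is a critical step in data monitoring, behavior analysis, and other applications, and helps enable continual learning in the field. Conventional OOD detection methods perform multivariate analysis on an ensemble of data or features, and usually resort to supervision with OOD data to improve accuracy. In reality, such supervision is impractical because one cannot anticipate the anomalous data. In this paper, we propose a novel, self-supervised approach that does not rely on any predefined OOD data: (1) the new method evaluates the Mahalanobis distance of the gradients between the in-distribution and OOD data; (2) it is assisted by a self-supervised binary classifier that guides the label selection used to generate the gradients and maximizes the Mahalanobis distance. In evaluations on multiple datasets, such as CIFAR-10, CIFAR-100, SVHN, and TinyImageNet, the proposed approach consistently outperforms state-of-the-art supervised and unsupervised methods on the area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPR) metrics. We further demonstrate that the detector can accurately learn one OOD class in continual learning.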
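
The Mahalanobis scoring in step (1) can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the feature vectors here are random stand-ins, whereas the paper computes the distance on gradients, and the helper names and the `ridge` regularizer are hypothetical.

```python
import numpy as np

def fit_gaussian(features, ridge=1e-6):
    """Estimate the mean and inverse covariance of in-distribution vectors."""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    cov += ridge * np.eye(features.shape[1])  # assumed regularizer for stability
    return mean, np.linalg.inv(cov)

def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance of one vector from the fitted distribution."""
    diff = x - mean
    return float(np.sqrt(diff @ cov_inv @ diff))

# Stand-in vectors; in the paper these would be gradient-based features.
rng = np.random.default_rng(0)
in_dist = rng.normal(0.0, 1.0, size=(500, 4))
mean, cov_inv = fit_gaussian(in_dist)

near = mahalanobis(np.zeros(4), mean, cov_inv)      # typical in-distribution point
far = mahalanobis(np.full(4, 6.0), mean, cov_inv)   # point far from the training data
```

A larger distance marks a sample as more likely OOD; a threshold on this score would turn it into a detector.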


POTATO: The Portable Text Annotation Tool

Jiaxin Pei, Aparna Ananthasubramaniam, Xingyao Wang, Naitian Zhou, Jackson Sargent, Apostolos Dedeloudis, David Jurgens

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-12-16

We present POTATO, the Portable Text Annotation Tool, a free, fully open-sourced annotation system that 1) supports labeling many types of text and multimodal data; 2) offers easy-to-configure features to maximize the productivity of both deployers and annotators (convenient templates for common ML/NLP tasks, active learning, keypress shortcuts, keyword highlights, tooltips); and 3) supports a high degree of customization (editable UI, inserting pre-screening questions, attention and qualification tests). Experiments over two annotation tasks suggest that POTATO improves labeling speed through its specially designed productivity features, especially for long documents and complex tasks. POTATO is available at https://github.com/davidjurgens/potato and will continue to be updated.


MOPRD: A multidisciplinary open peer review dataset

Jialiang Lin, Jiaxin Song, Zhangping Zhou, Yidong Chen, Xiaodong Shi

Categories: Artificial Intelligence | Natural Language Processing | Machine Learning

2022-12-09

Open peer review is a growing trend in academic publications. Public access to peer review data can benefit both the academic and publishing communities. It also supports studies on review comment generation and, further, the realization of automated scholarly paper review. However, most existing peer review datasets do not provide data that cover the whole peer review process. In addition, their data are not diverse enough, as they are mainly collected from the field of computer science. These two drawbacks of the currently available peer review datasets need to be addressed to unlock more opportunities for related studies. In response, we construct MOPRD, a multidisciplinary open peer review dataset. This dataset consists of paper metadata, multiple manuscript versions, review comments, meta-reviews, authors' rebuttal letters, and editorial decisions. Moreover, we design a modular guided review comment generation method based on MOPRD. Experiments show that our method delivers better performance, as indicated by both automatic metrics and human evaluation. We also explore other potential applications of MOPRD, including meta-review generation, editorial decision prediction, author rebuttal generation, and scientometric analysis. MOPRD is a strong endorsement for further studies in peer review-related research and other applications.


Towards Accurate Ground Plane Normal Estimation from Ego-Motion

Jiaxin Zhang, Wei Sui, Qian Zhang, Tao Chen, Cong Yang

Categories: Computer Vision | Robotics

2022-12-08

In this paper, we introduce a novel approach for ground plane normal estimation of wheeled vehicles. In practice, the ground plane changes dynamically due to braking and unstable road surfaces. As a result, the vehicle pose, especially the pitch angle, oscillates with an amplitude ranging from subtle to obvious. Estimating the ground plane normal is therefore meaningful, since it can be encoded to improve the robustness of various autonomous driving tasks (e.g., 3D object detection, road surface reconstruction, and trajectory planning). Our proposed method uses only odometry as input and estimates accurate ground plane normal vectors in real time. In particular, it fully utilizes the underlying connection between the ego pose odometry (ego-motion) and its nearby ground plane. Building on that, an Invariant Extended Kalman Filter (IEKF) is designed to estimate the normal vector in the sensor's coordinate frame. Our proposed method is thus simple yet efficient and supports both camera- and inertial-based odometry algorithms. Its usability and the marked improvement in robustness are validated through multiple experiments on public datasets. For instance, we achieve state-of-the-art accuracy on the KITTI dataset, with an estimated vector error of 0.39°. Our code is available at github.com/manymuch/ground_normal_filter.
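
As a rough illustration of how a vector error in degrees can be computed, the sketch below measures the angle between an estimated and a reference normal. It assumes the reported error is this angle; the abstract does not define the metric, and the function name and example vectors are hypothetical.

```python
import math

def angular_error_deg(n_est, n_ref):
    """Angle in degrees between two 3D vectors (they need not be unit length)."""
    ax, ay, az = n_est
    bx, by, bz = n_ref
    dot = ax * bx + ay * by + az * bz
    # The cross-product norm with atan2 stays accurate for very small angles.
    cx = ay * bz - az * by
    cy = az * bx - ax * bz
    cz = ax * by - ay * bx
    cross_norm = math.sqrt(cx * cx + cy * cy + cz * cz)
    return math.degrees(math.atan2(cross_norm, dot))

# Reference normal pointing "up"; the estimate is tilted by a small pitch angle.
tilt = math.radians(0.39)
reference = (0.0, 1.0, 0.0)
estimate = (0.0, math.cos(tilt), math.sin(tilt))
error = angular_error_deg(estimate, reference)
```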


G-MAP: General Memory-Augmented Pre-trained Language Model for Domain Tasks

Zhongwei Wan, Yichun Yin, Wei Zhang, Jiaxin Shi, Lifeng Shang, Guangyong Chen, Xin Jiang, Qun Liu

Category: Natural Language Processing

2022-12-07

Recently, domain-specific PLMs have been proposed to boost the task performance of specific domains (e.g., biomedical and computer science) by continuing to pre-train general PLMs with domain-specific corpora. However, this Domain-Adaptive Pre-Training (DAPT; Gururangan et al. (2020)) tends to forget the previous general knowledge acquired by general PLMs, which leads to a catastrophic forgetting phenomenon and sub-optimal performance. To alleviate this problem, we propose a new framework of General Memory Augmented Pre-trained Language Model (G-MAP), which augments the domain-specific PLM by a memory representation built from the frozen general PLM without losing any general knowledge. Specifically, we propose a new memory-augmented layer, and based on it, different augmented strategies are explored to build the memory representation and then adaptively fuse it into the domain-specific PLM. We demonstrate the effectiveness of G-MAP on various domains (biomedical and computer science publications, news, and reviews) and different kinds (text classification, QA, NER) of tasks, and the extensive results show that the proposed G-MAP can achieve SOTA results on all tasks.


Accelerating Inverse Learning via Intelligent Localization with Exploratory Sampling

Jiaxin Zhang, Sirui Bi, Victor Fung

Categories: Machine Learning | Artificial Intelligence

2022-12-02

In the scope of "AI for Science", solving inverse problems is a longstanding challenge in materials and drug discovery, where the goal is to determine the hidden structures given a set of desirable properties. Deep generative models are recently proposed to solve inverse problems, but these currently use expensive forward operators and struggle in precisely localizing the exact solutions and fully exploring the parameter spaces without missing solutions. In this work, we propose a novel approach (called iPage) to accelerate the inverse learning process by leveraging probabilistic inference from deep invertible models and deterministic optimization via fast gradient descent. Given a target property, the learned invertible model provides a posterior over the parameter space; we identify these posterior samples as an intelligent prior initialization which enables us to narrow down the search space. We then perform gradient descent to calibrate the inverse solutions within a local region. Meanwhile, a space-filling sampling is imposed on the latent space to better explore and capture all possible solutions. We evaluate our approach on three benchmark tasks and two created datasets with real-world applications from quantum chemistry and additive manufacturing, and find our method achieves superior performance compared to several state-of-the-art baseline methods. The iPage code is available at https://github.com/jxzhangjhu/MatDesINNe.
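
The two-stage idea in the iPage abstract (candidate solutions as initialization, then gradient descent to calibrate them locally) can be sketched on a toy problem. Everything specific here is an assumption: the forward model `x * x`, the uniform random initial points standing in for posterior samples from an invertible model, and the step size. It is not the iPage code.

```python
import random

def forward(x):
    """Toy forward model (hypothetical): maps a parameter to a property."""
    return x * x

def refine(x0, target, lr=0.01, steps=500):
    """Gradient descent on the squared residual (forward(x) - target)^2."""
    x = x0
    for _ in range(steps):
        residual = forward(x) - target
        grad = 2.0 * residual * 2.0 * x  # chain rule: 2 * residual * d(forward)/dx
        x -= lr * grad
    return x

rng = random.Random(0)
target = 4.0
# Stand-in for posterior samples: random starting points in the parameter space.
initial = [rng.uniform(-3.0, 3.0) for _ in range(20)]
solutions = [refine(x0, target) for x0 in initial]
```

Starting from many initial points lets the local refinement recover both inverse solutions of the toy problem (near -2 and near 2), which mirrors the abstract's aim of not missing solutions.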


AutoCAD: Automatically Generating Counterfactuals for Mitigating Shortcut Learning

Jiaxin Wen, Yeshuang Zhu, Jinchao Zhang, Jie Zhou, Minlie Huang

Categories: Artificial Intelligence | Natural Language Processing

2022-11-29

Recent studies have shown the impressive efficacy of counterfactually augmented data (CAD) for reducing NLU models' reliance on spurious features and improving their generalizability. However, current methods still heavily rely on human efforts or task-specific designs to generate counterfactuals, thereby impeding CAD's applicability to a broad range of NLU tasks. In this paper, we present AutoCAD, a fully automatic and task-agnostic CAD generation framework. AutoCAD first leverages a classifier to unsupervisedly identify rationales as spans to be intervened, which disentangles spurious and causal features. Then, AutoCAD performs controllable generation enhanced by unlikelihood training to produce diverse counterfactuals. Extensive evaluations on multiple out-of-domain and challenge benchmarks demonstrate that AutoCAD consistently and significantly boosts the out-of-distribution performance of powerful pre-trained models across different NLU tasks, which is comparable or even better than previous state-of-the-art human-in-the-loop or task-specific CAD methods. The code is publicly available at https://github.com/thu-coai/AutoCAD.


High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization

Jiaxin Xie, Hao Ouyang, Jingtan Piao, Chenyang Lei, Qifeng Chen

Category: Computer Vision

2022-11-28

We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views while preserving specific details of the input image. High-fidelity 3D GAN inversion is inherently challenging due to the geometry-texture trade-off in 3D inversion, where overfitting to a single view input image often damages the estimated geometry during the latent optimization. To solve this challenge, we propose a novel pipeline that builds on the pseudo-multi-view estimation with visibility analysis. We keep the original textures for the visible parts and utilize generative priors for the occluded parts. Extensive experiments show that our approach achieves advantageous reconstruction and novel view synthesis quality over state-of-the-art methods, even for images with out-of-distribution textures. The proposed pipeline also enables image attribute editing with the inverted latent code and 3D-aware texture modification. Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
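
The rule "keep the original textures for the visible parts and utilize generative priors for the occluded parts" can be illustrated as a per-pixel composite. This is only a sketch of that one rule under assumptions: images are small grayscale grids, the visibility mask is given, and the names `warped`, `generated`, and `visible` are hypothetical. The paper's visibility analysis and optimization are not reproduced.

```python
def composite(warped, generated, visible):
    """Per pixel: keep the input texture where visible, else the generated one."""
    return [
        [w if v else g for w, g, v in zip(w_row, g_row, v_row)]
        for w_row, g_row, v_row in zip(warped, generated, visible)
    ]

# 2x2 grayscale example: top row visible in the novel view, bottom row occluded.
warped = [[0.9, 0.8], [0.0, 0.0]]        # input texture warped to the novel view
generated = [[0.5, 0.5], [0.4, 0.3]]     # texture proposed by the generative prior
visible = [[True, True], [False, False]]
pseudo_view = composite(warped, generated, visible)
```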
